PaCMan : Parallel Corpus Management Workbench
Abstract
We present a parallel corpora management tool that aids parallel corpus generation for the task of Machine Translation (MT). It takes the source and target text of a corpus for any language pair as text files, or as zip archives containing multiple corresponding text files. It then provides lexicographers with a helpful interface for manual translation / validation, and outputs the corrected text files. It provides various dictionary references as help within the interface, which increase the productivity and efficiency of a lexicographer. It also provides automatic translation of the source sentence using an integrated MT system. The tool interface includes a corpora management system which facilitates maintenance of parallel corpora by assigning roles such as manager, lexicographer etc. We have designed a novel tool that provides aids such as references to various dictionary sources, including Wordnets, Shabdkosh, Wiktionary etc. We also provide manual word alignment correction, which is visualized in the tool and can lead to its gamification in the future, thus providing a valuable source of word / phrase alignments.
Similar resources
NATools - A statistical Word Aligner Workbench
This document presents the TerminUM project and the work done in its statistical word aligner workbench (NATools). It shows a variety of alignment methods for parallel corpora and discusses the resulting terminological dictionaries and their use: evaluation of sentence translations; construction of a multi-level navigation system for linguistic studies or statistical translations.
Development of a Corpus Workbench for the METU Turkish Corpus
We will introduce a corpus workbench designed and implemented for the METU Turkish Corpus. The workbench design introduces a number of useful features and the workbench itself is basically usable with any TEI and XML compliant corpus, provided that it can be indexed in the format required by the workbench.
The MATE Workbench –
The increasing variety and sophistication of spoken language dialogue systems (SLDSs) emphasises the need for tools in support of their development and evaluation as well as for appropriate evaluation criteria. In this paper we describe how the MATE workbench can be used during SLDSs development to efficiently produce corpus-based information on SLDSs and their components. The information retri...
Advances in the Witchcraft Workbench Project
The Workbench for Intelligent exploraTion of Human ComputeR conversaTions is a new platform-independent open-source workbench designed for the analysis, mining and management of large spoken dialogue system corpora. What makes Witchcraft unique is its ability to visualize the effect of classification and prediction models on ongoing system-user interactions. Witchcraft is now able to handle pre...
DialogStudio: A workbench for data-driven spoken dialog system development and management
Recently, data-driven speech technologies have been widely used to build speech user interfaces. However, developing and managing data-driven spoken dialog systems are laborious and time consuming tasks. Spoken dialog systems have many components and their development and management involves numerous tasks such as preparing the corpus, training, testing and integrating each component for system...
Publication date: 2014